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Abstract 

In this paper, we propose a novel inference method for dynamic genetic net- 
works which makes it possible to deal with a number of time measurements 
n much smaller than the number of genes p. The approach is based on the 
concept of low order conditional dependence graph which we extend here to 
the case of Dynamic Bayesian Networks. Most of our results are based on the 
theory of graphical models associated with Directed Acyclic Graphs (DAGs). 
In this way, we define a DAG Q which describes exactly the full order con- 
ditional dependencies given the past of the process. Then, to cope with the 
large p and small n estimation case, we propose to approximate DAG Q by 
considering low order conditional independencies. We introduce partial q th or- 
der conditional dependence DAGs and analyze their probabilistic properties. 
In general, DAGs differ from Q but still reflect relevant dependence facts 
for sparse networks such as genetic networks. By using this approximation, 
we set out a non-Bayesian inference method and demonstrate the effectiveness 
of this approach on both simulated and real data analysis. The inference pro- 
cedure is implemented in the R package 'G1DBN' which is available from the 
CRAN archive. 

Keywords: conditional independence, Dynamic Bayesian Network, Di- 
rected Acyclic Graph, networks inference, time series modelling. 
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Introduction 



The development of microarray technology allows to simultaneously measure 
the expression levels of many genes at a precise time point. Thus it has become 
possible to observe gene expression levels across a whole process such as the 
cell cycle or response to radiation or different treatments. The objective is 
now to recover gene regulation phenomena from this data. We are looking for 
simple relationships such as "gene i activates gene j" . But we also want to 
capture more complex scenarios such as auto-regulations, feed-forward loops, 
multi-component loops... as described by Lee et al. [21] in the case of the 
transcriptional regulatory network of the yeast Saccharomyces cerevisiae. 

To such an aim, we both need to accurately take into account temporal 
dependencies and to deal with the dimension of the problem when the number 
p of observed genes is much higher than the number n of observation time 
points. Moreover we know that most of the genes whose expression has been 
monitored using microarrays are not taking part in the temporal evolution of 
the system. So we want to determine the few 'active' genes that are involved 
in the regulatory machinery, as well as the relationships between them. In 
short, we want to infer a network representing the dependence relationships 
which govern a system composed of several agents from the observation of 
their activity across short time series. 

Static Modelling Such gene networks were first described using static mod- 
elling and mainly non oriented networks. One of the first tools used to de- 
scribe interactions between genes is the relevance network [5] or correlation 
network [36]. Better known as the covariance graph [7] in graphical mod- 
els theory, this undirected graph describes the pair-wise correlation between 
genes. Its topology is derived from the covariance matrix between the gene 
expression levels; an undirected edge is drawn between two variables whenever 
they are correlated. However, the correlation between two variables may be 
caused by linkage with other variables. This creates spurious edges due to 
indirect dependence relationships. 

Consequently, there has been great interest in the concentration graph [20], 
also called the covariance selection model, which describes the conditional de- 
pendence structure between gene expression using Graphical Gaussian Models 
(GGMs). Let Y = (Y i ) i<i< p be a multivariate Gaussian vector represent- 
ing the expression levels of p genes. An undirected edge is drawn between 
two variables Y l and Y^ whenever they are conditionally dependent given the 
remaining variables (See Figure IB). The standard theory of estimation in 
GGMs [20, 46] can be exploited only when the number of measurements n is 
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Figure 1: (A) A biological regulation motif. (B) The concentration graph 
corresponding to the motif A. For all i > 3, Y % is a Gaussian variable repre- 
senting the expression level of gene G l . Some cycles cannot be represented on 
the concentration graph. (C) Dynamic network equivalent to the regulation 
motif A. Each vertex X\ represents the expression level of gene G % at time t. 
This graph is acyclic and allows to define a Bayesian network. 

much higher than the number of variables p. This ensures that the sample 
covariance matrix is positive definite with probability one. However, in most 
microarray gene expression datasets, we have to cope with the opposite situa- 
tion (n << p). Thus, the growing interest in "small n, large p" furthered the 
development of numerous alternatives (Schafer and Strimmer [31, 32] , Wad- 
dell and Kishino [44, 43], Toh and Horimoto [40, 41], Wu et al. [50], Wang et 
al. [45]). Even though concentration graphs allow to point out some depen- 
dence relationships between genes, they do not offer an accurate description 
of the interactions. Firstly, no direction is given to the interactions. Secondly, 
some motifs containing cycles as in Figure 1A cannot be properly represented. 

Contrary to the previous undirected graphs, Bayesian networks (BNs) [13] 
model directed relationships. Based on a probabilistic measure, a BN repre- 
sentation of a model is defined by a Directed Acyclic Graph (DAG) and the 
set of conditional probability distributions of each variable given its parents in 
the DAG [28]. The theory of graphical models [46, 9, 20] then allows to derive 
conditional independencies from this DAG. However, the acyclicity constraint 
in static BNs is a serious restriction given the expected structure of genetic 
networks. 
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Dynamic Bayesian networks This limitation can be overcome by employ- 
ing Dynamic Bayesian networks (DBNs) introduced for the analysis of gene 
expression time series by Friedman et al. [14] and Murphy and Mian [25]. 
In DBNs, a gene is no longer represented by a single vertex but by as many 
vertices as time points in the experiment. A dynamic network (Figure 1C) 
can then be obtained by unfolding in time the initial cyclic motif in Figure 
1A. The direction according to time guarantees the acyclicity of this dynamic 
network and consequently allows to define a Bayesian network. The nature 
of the relationships (positive/negative) does not appear in this DAG but is 
derived from estimates of the model parameters. 

The very high number p of genes simultaneously observed raises a dimen- 
sion problem. Moreover, a large majority of time series gene expression data 
contain no or very few repeated measurements of the expression level of the 
same gene at a given time. Hence, we assume that the process is homoge- 
neous across time. This means that the system is considered to be governed 
by the same rules during the whole experiment. Consequently, the temporal 
dependencies are homogeneous: any edge is present or absent during the whole 
process. This is a strong assumption which is not necessarily satisfied. Nev- 
ertheless, this condition is necessary to carry out estimation unless we have 
several measurements of each gene expression at each time point. 

Up to now, various DBN representations based on different probabilis- 
tic models have been proposed (discrete models [26, 51], multivariate auto- 
regressive process [27], State Space or Hidden Markov Models [29, 49, 30, 3], 
nonparametric additive regression model [16, 17, 19, 37]). See also Kim et al. 
[18] for a review of such models. Faced with so much diversity, we introduce 
in this paper sufficient conditions for a model to admit a DBN representation 
and we set out a concrete interpretation in terms of dependencies between 
variables by using the theory of graphical models for DAGs. 

Our DBN representation is based on a DAG Q (e.g. like the DAG of 
Fig. 1C) which describes exactly the full order conditional dependencies given 
all the remaining past variables (See Section 1). This approach extends the 
principle of the concentration graph showing conditional independencies to the 
dynamic case. 

Dimension reduction Even under the assumption of homogeneity, which 
enables to use the pairs of successive time point gene expression as repeated 
measurements, we have to deal with the "curse of dimensionality" when in- 
ferring the structure of DAG Q. The difficulty lies in coping with the large p 
and small n estimation case. Several inference methods have been proposed 
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for the estimation of the topology of the DAG defining the various DBNs 
quoted above. To name a few, Murphy [24] implemented several Bayesian 
structure learning procedures for dynamic models in the Matlab package BNT 
(Bayes Net Toolbox); Ong et al. [26] reduce the dimension of the problem by 
considering prior knowledge; Perrin et al. [29] use an extension of the linear 
regression; Wu et al. [49] use factor analysis and Beal et al. [3] develop a 
variational Bayesian method; Zou and Conzen [51] limit potential regulators 
to the genes with either earlier or simultaneous expression changes and esti- 
mate the transcription time lag; Opgen-Rhein and Strimmer [27] proposed a 
model selection procedure based on an analytic shrinkage approach. However, 
a powerful approach based on the consideration of zero- and first-order con- 
ditional independencies to model concentration graphs has gained attention. 
When n « p, Wille et al. [48, 47] propose to approximate the concentration 
graph by the graph Qo-i describing zero- and first-order conditional indepen- 
dence. An edge between the variables Y l and is drawn in the graph C/o-i if 
and only if, zero- and first-order correlations between these two variables both 
differ from zero, that is, if 

r(Y\Y j )^0 and VA; e {1, ...,p}\{i,j}, r(Y\ Y j \Y k ) ^ 0, (1) 

where r(Y\ Y J \Y k ) is the partial correlation between Y % and Y^ given Y k . 
Hence, whenever the correlation between two variables Y % and can be 
entirely explained by the effect of some variable Y k , no edge is drawn between 
them. 

This procedure allows a drastic dimension reduction: by using first order 
conditional correlations, estimation can be carried out accurately even with 
a small number of observations. Even if the graph of zero- and first-order 
conditional independence differs from the concentration graph in general, it 
still reflects some measure of conditional independence. Wille et al. show 
through simulations that the graph C?o-i offers a good approximation of sparse 
concentration graphs and demonstrate that both graphs coincide exactly if 
the concentration graph is a forest ([47], Corollary 1). This approach has also 
been used by Magwene and Kim [22] and de la Fuente et al. [8] for estimating 
undirected gene networks from microarray gene expression of the yeast Saccha- 
romyces cerevisiae. Castelo and Roverato [6] investigate such undirected q th 
order partial independence graphs for q > 1 and present a thorough analysis 
of their properties. In this paper, we extend this approach by defining q th or- 
der order conditional dependence DAGs for DBN representations. Then, 
by basing our results on these low order conditional dependence DAGs, we 
propose a novel inference method for dynamic genetic networks which makes 
it possible to deal with the "small n, large p" problem. 
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Table 1: Notations 



p = 


{l<i<p} 


set of the observed genes, 


Pi = 


p\{i} 


set of the observed genes except gene i, 


N = 


{1< t < n} 


set of observation times, 


X = 


{Xi;ieP,teN} 


stochastic process (gene expression time series), 


Q = 




a DAG whose vertices are defined by X and 






edges by E(Q) CXxX, 


Q 




the "true" DAG describing the set of 






full order conditional dependencies, 






qth orc i er conditional dependence DAG, 



The remainder of the paper is organized as follows. In Section 1, we provide 
sufficient conditions for a DBN modelling of time series describing temporal 
dependencies. In particular, we show the existence of a minimal DAG Q which 
allows such a DBN representation. To reduce the dimension of the estimation 
of the topology of Q, we propose to approximate Q by q th order conditional 
dependence DAGs and analyze their probabilistic properties in Section 2. 
From conditions on the topology of Q and the faithfulness assumption, we 
establish inclusion relationships between both DAGs Q and Q^ q > . In Section 3, 
we exploit our results on DAGs 

Finally, validation is obtained on both simulated and real data in Section 4. 
We use our inference procedure for the analysis of two microarray time course 
data sets: the Spellman's yeast cell cycle data [34] and the diurnal cycle data 
on the starch metabolism of Arabidopsis Thaliana collected by Smith et al. 
[33]. 

1 A minimal DBN representation 

Let P — {1 < i < p} describe the set of observed genes and iV = {l < t < n} 
the set of observation times. In this paper, we consider a discrete-time stochastic 
process X = {XI; i G P, t G iV} taking real values and assume the joint 
probability distribution P of the process X has density / with respect to 
Lebesgue measure on M pxn . We denote by X t = {X\\ i G P} the set of the p 
random variables observed at time t and X\-t = {X l s ; i G P, s < t} the set of 
the random variables observed before time t. 

The main result of this section is set out in Proposition 3; we show that 
process X admits a DBN representation according to a minimal DAG Q whose 
edges describe exactly the set of direct dependencies between successive vari- 
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ables X\_ Xl X\ given the past of the process. For an illustration, the minimal 
DAG Q is given in the case of an AR(1) model in Subsection 1.2. Most of 
our results are derived from the theory of graphical models associated with 
DAGs [20]. Note that, even though we need to consider a homogeneous DBN 
for the inference of gene interaction networks, the theoretical results introduced 
in Sections 1 and 2 are valid without assuming homogeneity across time. 

1.1 Background 

Theory of graphical models associated with DAGs Let Q = (X, E{Q)) 
be a DAG whose vertices are the variables X = {X l t \ i e P,t 6 N} and whose 
set of edges E(Q) is a subset of X x X. We quickly recall here elements of 
the theory of graphical models associated with DAGs [20]. A characteriza- 
tion of a Bayesian Network (BN) representation for a process X is given in 
Proposition 1. 

Definition 1 (Parents, Lauritzen [20]) The parents of a vertex X\ in Q, 
denoted by pa(X l t ,Q), are the variables having an edge pointing towards the 
vertex X\ in Q , 

pa(Xi,g) := {X j s such that (X J s ,X l t ) G E(Q);j £?,s£ N}. 

Proposition 1 (BN representation, Pearl [28]) The probability distri- 
bution P of process X admits a Bayesian Network (BN) representation accord- 
ing to DAG Q whenever its density f factorizes as a product of the conditional 
density of each variable X\ given its parents in Q , 

f(x) = Hilf(x l t \ P a(xig)). 

i&PteN 

Throughout this paper, a central notion is that of conditional independence 
of random variables. Two random variables U and V are conditionally inde- 
pendent given a third variable W (and we write U X V \ W) if they are 
independent in the joint probability distribution ^uy,w of the three random 
variables (U, V, W). In other words, U and V are conditionally independent 
given W if for any possible value w of W, variables U and V are independent 
given the variable W — w. This result generalizes to disjoint sets of variables. 
Such conditional independence relationships can be obtained from a BN repre- 
sentation by using graphical theory associated with DAGs, which is essentially 
based on the directed global Markov property recalled in Proposition 2. 
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Figure 2: (A) Moral graph of the DAG in Figure 1C. For all t > 1, the parents 
of the variable X\ are 'married', that is connected by an undirected edge. 
(B) Moral graph of the smallest ancestral set containing the variables Xh x , 
its parents in the DAG in Figure 1C and Xf. As the set (X^,X^) blocks all 
paths between X? and X£ +1 , thus {Xl,Xf} separates X^from Xf and we 
haveA7 +1 XX t 3 | {X},X*). 

Definition 2 (Moral graph, Lauritzen [20]) The moral graph Q m of DAG 
Q is obtained from Q by first 'marrying' the parents (draw an undirected edge 
between each pair of parents of each variable X\) and then deleting the direc- 
tions of the original edges of Q . For an illustration, Figure 2 A displays the 
moral graph of the DAG in Figure 1C. 

Definition 3 (Ancestral set, Lauritzen [20]) The subset S is ancestral 
if and only if, for all a G S , the parents of a satisfy pa(a, Q) C S . Hence, for 
any subset S of vertices, there is a smallest ancestral set containing S which 
is denoted by An(S). Then Qau{s) refers to the graph of the smallest ancestral 
set An(S). See Figure 2B for an illustration. 

Proposition 2 (Directed global Markov property, Lauritzen [20], 
Corollary 3.23) Let P admit a BN representation according to Q. Then, 

E IF | S, 

whenever all paths from E to F intersect S in {QAn(EuFus)) m > the moral graph 
of the smallest ancestral set containing E U F U S. We say that S separates 
E from F. 
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Sufficient conditions for DBNs representation We recall here sufficient 
conditions under which the probability distribution P of process X admits a 
BN representation according to a dynamic network (e.g. in Figure 1C). We first 
assume that the observed process X t is first-order Markovian (Assumption 1). 
That is, the expression level of a gene at a given time t only depends on the 
past through the gene expression levels observed at the previous time t — 1. 
Then we assume that the variables observed simultaneously are conditionally 
independent given the past of the process (Assumption 2). In other words, 
we consider that time measurements are close enough so that gene expression 
level X\ measured at time t is better explained by the previous time expression 
levels X t _i than by some current expression level X\. 

Assumption 1 The stochastic process X t is first- order Markovian, that is, 

Vt>3, X t ±X 1:t _ 2 | AVi- 

Assumption 2 For allt > 1, the random variables {X\}i & p are conditionally 
independent given the past of the process X\.t-i, that is, 

vt>i,Vz^j, xi±xi I X 1:t _!. 

Assumptions 1 and 2 allow the existence of a DBN representation of the 
distribution P according to DAG Gfidi = (X, {{Xl_ x , ^t)}*,j'eP,t>i) 

which contains all the edges pointing out from a variable observed at some 
time t — 1 towards a variable observed at the next time t (See Lemma 1 in 
Appendix A.l). The direction of the edges according to time guarantees the 
acyclicity of Q fuU . 

1.2 Minimal DAG Q 

Existence and definition Among the DAGs included in Qf u u, we show 
that the probability distribution P factorizes according to a minimal DAG, 
which we denote by Q (See Lemma 2, Appendix A.l). The set of edges of Q 
is exactly the set of full order conditional dependencies between successive 
variables given the past of the process as set up in the Proposition 3 (See 
Proof in Appendix A. 2). 
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Proposition 3 (Existence of minimal DAG Q, the smallest subgraph 
of Qfuu allowing DBN modelling) Let Pj = P\{j} and X t 3 = {X^; k E Pj} 
refer to the set Pj of p — 1 variables observed at time t. Whenever Assump- 
tions 1 and 2 are satisfied, the probability distribution P admits a DBN rep- 
resentation according to DAG Q whose edges describe exactly the full order 

conditional dependencies between successive variables X\_ x and X\ given the 

p. 

remaining variables X t l^ observed at time t — 1, 

Q = {(xuxt)- x\ x xU\xZx\. . p ) 

\ l> J l,j£P,t£N J 

Moreover, DAG Q is the smallest subgraph ofGfuii according to which P admits 
a DBN representation. 

Thus in DAG Q, the set of parents pa(Xl, Q) of a variable X\ is the smallest 
subset of X t -\ such that the conditional densities satisfy f{X\\pa{X\, Q)) = 
f(Xl\Xt-i). The set of parents of a variable can be seen as the only variables 
on which this variable depends directly. So Q is the DAG we want to infer in 
order to recover potential regulation relationships from gene expression time 
series. From Proposition 3, any pair of successive variables {Xl_ l ,X l t ) which 
are non adjacent in Q are conditionally independent given the parents of X\. 
In short, for all i,j in P, for all t > 1, we have, 

(XUXi) i E(Q) & X\ X X{_ x | pa{Xl Q). 

We will make use of this result in Section 2 in order to define low order con- 
ditional dependence DAGs for the inference of Q. 

Minimal DAG Q for an AR(1) process Consider the following first order 
auto-regressive model (AR(1)) with a diagonal error covariance matrix E, 

Xi~JV(//i,Ei) (2) 

Vi>l, X t = AX t _, + B + e t , £i~A/"(0,£), (3) 

Vs,t G A, Cov{e u e s ) = 5 ts Z, (4) 

Vs>t, Cov(X t ,e s ) = 0. (5) 

where ^4=(aij)i<i<p,i<j< P is a real matrix of size pxp, B = (bi)i<i< p is a real 
column vector, E = Diag (o~%) 1<i<p is the diagonal error covariance matrix of 
size pxp and for all s,t in N, 5t s =i{ s =t}- Equation (5) implies that the coefficient 
matrices are uniquely determined from the covariance function of X t . 
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This modelling assumes homogeneity across time (constant matrix A) and 
linearity of the dependency relationships. From (3) and (5), the model is 
first order Markovian (Assumption 1). From (4), Assumption 2 is satisfied 
whenever the error covariance matrix E is diagonal. Thus from Proposition 3, 
the probability distribution of the AR(1) process defined by equations (2-5) 
factorizes according to the minimal DAG Qar{i) whose edges correspond to 
the non-zero coefficients of matrix A. Indeed, if matrix E is diagonal, each 
element is the regression coefficient of the variable X\ on X\_ x given Xfj_ l7 
that is 

ay = Cov{XIXU | X^/VarixU \ 

As process X is Gaussian, the set of null coefficients of matrix A exactly 
describes the conditional independencies between successive variables, thus 
if E is diagonal, we have, 

ay = <=> Vt > 1, X\ X X{_ x \Xti v 

Finally, DAG Gar{i) has an edge between two successive variables X\_ x and 
XI, for all t > 1, whenever the coefficient of the matrix A differs from zero, 

Qaro) : = ( X > {( X L, X i) such that a iJ + 0; t > 1, i,j e P}) . (6) 

As an illustration, any AR(1) process whose matrix E is diagonal and matrix A 
has the following form, 



/ an ai2 \ 



A 



a 2 i 
\ a 32 / 

admits a BN representation according to the dynamic network of Fig.lC (p=3). 



2 Introducing q th order dependence DAGs Q^ q > 
for DBNs 

In this paper, we propose to use the DBN modelling according to DAG Q 
(introduced in Proposition 3) to model genetic regulatory networks from gene 
expression time series. Reverse discovery of DAG Q requires to determine, for 

each variable X\, the set of variables X\_ x observed at time t — 1 on which 

in- 
variable X\ is conditionally dependent given the remaining variables X t l v 

However, even under the time homogeneity assumption discussed in the intro- 
duction, standard estimation methods do not allow us to infer the parameters 
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of a regression model for p genes (i.e. p 2 possible edges) from np measure- 
ments. We still have to face the 'curse of dimensionality' since the number of 
genes p, is much higher than the number of measurements n. 

In order to reduce the dimension, we approximate DAG Q by q th order 
conditional dependence DAGs Q^ q ' (q < p). To such an end, we extend to 
DBNs the approach based on the consideration of low order independencies 
introduced by Wille et al.[48, 47] for GGM approximation (See more details 
on low order independence graphs for GGMs in Section ). After defining 
qth orc i er conditional dependence DAGs Q( q > for DBNs, we investigate the 
manner in which they allow us to approximate the DAG Q describing full 
order conditional dependencies. 

2.1 DAG definition 

Let q be smaller than p. In the q th order dependence DAG Q^ q \ whenever 

o p 
there exists a subset X^_ 1 of q variables among the set of p — 1 variables X t ^_ x 

such that X\_ x and X\ are conditionally independent given Xf_ x , no edge is 

drawn between the two successive variables X{_ x and X\. In short, DAGs 

are defined as follows, 

Definition 4 q th -order conditional dependence DAG Q( q ' 

Vg<p, G (q) =(x, UXUXI)- VQ C P h \Q\ = q,X\ JL xU\X?-i} )■ 

DAGs G {q) offer a way of producing dependence relationships between the 
variables, but they are no longer associated with a BN representation which 
would call for more global relationships. Note that the definition of q th order 
partial dependence DAG is based on exact q th order independencies (not 
on all partial independencies lower than q as in the partial order correlation 
network used by Wille and Biihlmann [47]). Indeed, we consider that including 
only the q th order dependencies better reflects the true DAG Q. In particular, 
for p variables, DAG is DAG Q. This definition is possible for DBNs 

because dynamic modelling essentially differs from static correlation network 
modelling 1 . 

1 In particular, contrary to the case of correlation network, the " V " structures (or 
structures with multiple parents) do not generate spurious edges in the case of DBN since 
the definition of the DAG Q defining full order dependencies docs not allow edges between 
variables observed at the same time. Thus, for instance, when considering the following 
" V " structure X\_ x — > X\ <— X^_ 1} no spurious edge can be inferred between the variables 

XU andX t fe _!. 
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Figure 3: First-order conditional dependence DAG (obtained from the 
DAG in Figure 1C). The spurious dashed arrow may appear in Q^. 

In general, DAGs differ from DAG Q. For instance, the approximation 
of the DAG of Figure 1C by the I s * order conditional dependence DAG may 
give rise to the spurious edge Xf —> X^ +1 , for all t < n (See Figure 3). Indeed, 
X\ (resp. X t 2 ) does not separate X^ +1 from Xf in the smallest moral graph 
containing the variables X} +1 U Xf U X\ (resp. Xh x U Xf U Xf) displayed in 
Figure 2B. Nevertheless, if the vertices of Q have few parents, DAGs Q^ 9 ' bring 
relevant information about the topology of Q, even for small values of q. In 
the following, we give characterizations of low order conditional dependence 
DAGs and analyze the accuracy of the approximations they offer. 

2.2 A restricted number of parents 

In some known gene regulation mechanisms, it is the case that a few genes 
regulate many other genes (e.g. the single input modules in the transcriptional 
regulatory network of S. Cerevisiae [21]). However, we do not expect a single 
gene to be regulated by many genes at the same time. So the number of 
parents in gene interaction networks is expected to be relatively small. In this 
section, we analyze the properties of when the number of parents in Q is 
lower than q. 

Let us denote by N pa (Xl, Q) the number of parents of X\ in DAG Q and 
N^ ax (Q) the maximal number of parents of any variable X\ in Q, 

N pa (Xl Q) = pa(Xi, Q) , N%r(g) = . Max (iVpaPQ, S)) ■ 

The next results hold when the number of parents in Q is restricted. 
Proposition 4 If N pa (X l t1 Q) < q then we have, 

{(xUxi)^E(g)} {{xUxi)£E{g 9 )}. 

Corollary 1 For all q > N^ ax (g), we have Q D Q^ q) . 

Proposition 5 Let X he a Gaussian process. If N^ a ax (Q) < 1 then Q — Q^i. 
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Consider a variable X\ having at most q parents in Q (q < p). Let X\_ x 
be a variable observed at the previous time t — 1 and having no edge pointing 
towards X\ in Q. In the moral graph of the smallest ancestral set containing 
X\ U X\_ x U pa(A], Q), the set of parents pa(A], Q) separates X\ from X\_ x . 
From Proposition 2, we have X\ X | pa(A7,C?). The number of parents 

pa(X f *,(J7) is smaller than q, so the edge X\_ x — > X\ is not in Q( q K This 
establishes Proposition 4. Consequently, if the maximal number of parents in 
Q is lower than q, then Q^ q > is included in Q (Corollary 1). In this case, 
does not contain spurious edges. 

The converse inclusion relationship is not true in general 2 . Nevertheless, if 
each variable has at most one parent, the converse inclusion Q C Q( l > is true if 
the process is Gaussian and q = 1 (Proposition 5, see proof in Appendix A. 2). 
At a higher order, we need to assume that all conditional independencies can 
be derived from Q, that is P is faithful to Q. 

2.3 Faithfulness 

Definition 5 (faithfulness, Spirtes [35]) A distribution P is faithful to 

a DAG Q if all and only the independence relationships true in P are entailed 
by Q (as set up in Proposition 2). 

Theorem 1 (Measure zero for unfaithful Gaussian (Spirtes [35]) 
and discrete (Meek [23]) distributions) Let i\g (resp. Tig) be the set 
of linearly independent parameters needed to parameterize a multivariate nor- 
mal distribution (resp. discrete distribution) P which admits a factorization 
according to a DAG Q. The set of distributions which are unfaithful to Q has 
measure zero with respect to Lebesgue measure over Wg (resp. over Kg). 

From Definition 5, whenever the distribution P is faithful to Q, any subset 
Q X t _i, with respect to which X\ and X{_ x are conditionally indepen- 
dent, separates X\ and X\_ x in the moral graph of the smallest ancestral set 
containing X\ U X{_ x U X^_ 1 . Under this assumption, we can derive interest- 
ing properties on Q from the topology of low order dependence DAGs Q^ q \ 

2 As an illustration, let X\_ x — >X\ be an edge of Q then in essence (See Prop 3) X\ and 
X{_ 1 are conditionally dependent given the remaining variables xj^i- There may however 
exist a subset of q variables X^_ l , where Q is a subset of P\{j} of size q, such that X\ and 
X\_ x are conditionally independent with respect to this subset Xf_ x . Indeed, even though 
the topology of Q allows us to establish some conditional independencies, DAG Q docs not 
necessarily allow to derive all of them. Two variables can be conditionally independent 
given a subset of variables whereas this subset does not separate these two variables in Q. 
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As there is no way to probability distribution to be faithful to a 

DAG, this assumption has often been criticized. However, Theorem 1, estab- 
lished by Spirtes [35] for the Gaussian distribution and extended to discrete 
distributions by Meek [23], makes this assumption reasonable at least in a 
measure-theoretic sense. Moreover this assumption remains very reasonable 
in a modelling framework where the network to be inferred describes actual 
interaction relationships. The next propositions are derived from the faithful- 
ness of the distribution P to Q (See proofs in Appendix A. 2). 

Proposition 6 Assume P is faithful to Q. For all q < p, we have Q C Q^ q \ 

Corollary 2 Assume P is faithful to Q . For all q > N^ a ax {Q), we have Q = Q^ q \ 

Proposition 7 Assume P is faithful to Q . 

IfN pa (X},gM)<qthen (XU, Xf) e E{Q®) (XU, Xf) e E{Q). 

Corollary 3 Assume P is faithful to Q. For all q > N^ ax (Q^), Q = 

Whenever P is faithful to Q, DAG G {q) contains DAG Q (Proposition 6). 
Even though we expect the number of parents in a gene interaction networks 
to be bounded aboce, the exact maximal number of parents N^ ax (Q) remains 
mostly unknown. However, we show that the edges of DAG pointing to- 
wards a variable having less than q parents in Q( q > are edges of Q too (Propo- 
sition 7). Thus, if P is faithful to Q, knowledge of the topology of DAG 
only allows us to ascertain some edges of DAG Q. From Propositions 6 and 7, 
we establish that both DAG and DAG Q exactly coincide if any node of 
Q( q > has less than q parents (Corollary 3). 

3 G1DBN, a procedure for DBN inference 

We introduced and characterized the q th order dependence DAGs G^ q \ for 
all q < p, for dynamic modelling. We now exploit our results to develop 
a non-Bayesian inference method for DAG Q defining a DBN representation 
for process X. Let q ma x be the maximal number of parents in Q. From 
Corollary 3, inferring Q amounts to inferring ^' ra ™'. However, the inference 
of g(i ma *) requires to check, for each pair if there exists a subset Q C Pj of 
dimension q ma x such that X\ X X^_ 1 \X^_ 1 for all t > 1. So, for each pair 
there are 

(p™r) potential sets that can lead to conditional independence. To 
test each conditional independence given any possible subset of q max variables 
is questionable both in terms of complexity and multiple testings. 
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To circumvent these issues, we propose to exploit the fact that the true 
DAG Q is a subgraph of C/W (Proposition 6) in order to develop an inference 
procedure for Q. Indeed, the inference of Q^> is both faster (complexity) and 
more accurate (number of tests). Thus we introduce a 2 step-procedure for 
DBN inference. In the first step, we infer the I s * order dependence DAG 
then we infer DAG Q from the estimated DAG Q^ 1 ' . This 2 step-procedure, 
summarized in Figure 4, is implemented in a R package 'G1DBN' [1] freely 
available from the Comprehensive R Archive Network. 

3.1 Step 1: inferring 

We evaluate the likelihood of an edge (A A ^_ 1 ,X t *) by measuring the conditional 
dependence between the variables X{_ x and X\ given any variable X^_ x . As- 
suming linear dependencies, we consider the partial regression coefficient a^u. 
defined as follows, 

X\ = rriijk + OijikX^ + a ife |jX t fc _ 1 + r)l'^' k , 

where the rank of the matrix (Xl_ ± , X t fc _ 1 ) t > 2 equals 2 and the errors {rf t ,: '' k }t>2 
are centered, have same variance and are not correlated. 

We measure the conditional dependence between the variables X\_ x and X\ 
given any variable X^_ x by testing the null assumption TC^' 1 *: "ayi* = 0". To 
such an aim, we use one out of three M-estimators for this coefficient: either 
the familiar Least Square (LS) estimator, the Huber estimator, or the Tukey 
bisquare (or biweight) estimator. The two latter are robust estimators [12]. 
Then for each k ^ j, we compute the estimates according to one of these 
three estimators and derive the p-value from the standard significance 
test: 

under (H^' k ) : " a ij]k = " , ~ t(n - 4), (7) 

V\P>ij\k) 

where t(n — 4) refers to a student probability distribution with n — 4 degrees 
of freedom and a(a,ij\k) is the variance estimates for a^. 

Thus, we assign a score S\(i,j) to each potential edge (A / "/_ 1 ,X f *) equal 
to the maximum MaXk^j{pij\k) of the p — 1 computed p- values, that is the 
most favorable result to I s * order conditional independence. This procedure 
does not derive p-values for the edges but allows to order the possible edges 
of DAG according to how likely they are. The smallest scores point out 
the most significant edges for The inferred DAG contains the edges 
assigned a score below a chosen threshold 
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Choose either LS,Huber or Tukey estimator and set cty and a 2 
thresholds . 

Step 1: inferring 

For all i e P, 

For all j G P, for all k^j, compute the p-value Pij\k from (7), 
Si(i,j) = Max k ft(pij\k) • 

E(gV>) = X t ')t>i; ». j e such that 5i(i,j) < ai}. 

Step 2: inferring £/ from 
If A r ^ ax (^( 1) ) ~ n - 1, choose a higher threshold a\ and go to Stepl. 
For alii such that N pa {X\, Q^) > 1 , compute the p-value p^ f rom (9). 

f Pi? M a// i,j G P suc/i tfarf (X J t _ v Xt) t>1 G S« 

1 otherwise. 

E(G) = {(Xj^Xpt^-i G P,(i,j) G P such that ff 2 (i,j) < q 2 }. 

Figure 4: Outline of the 2 step-procedure G1DBN for DBN inference. 

3.2 Step 2: inferring from 

We use the inferred DAG as a reduction of the search space. Indeed, from 
faithfulness, we know that Q C (Proposition 6). Moreover, when DAG 
Q is sparse, there are far fewer edges in Q^ 1 ' than in the complete DAG Q^ii 
defined in Section 1.1. Consequently, the number of parents of each variable 
in is much smaller than n. Then model selection can be carried out using 
standard estimation and tests among the edges of Q^ 1 '. For each pair (i, j) such 
that the set of edges (X 3 t _ l: X % t ) t> i is in Q^ x \ we denote by a[f the regression 
coefficient, 

X\ = m l+ a S? X '-i + Vi (8) 

where the rank of the matrix (-Xt-i)t>2,,jepa(.xi,(/W) is \pa(X},gW)\ and the 
errors {t]l}t>2 are centered, have the same variance, and are not correlated. 
We assign to each edge of a score S 2 (i, j) equal to the p-value pf^ derived 
from the significance test, 

.(2) 

under (7#) : " ag> = ", ~ t(n - 1 - \pa(X*, <? (1) ) |). (9) 

) 

The score S%(i, j) = 1 is assigned to the edges that are not in Q^ 1 ' . The smallest 
scores indicate the most significant edges. The inferred DAG for Q contains 
those edges whose score is below a chosen threshold a 2 . 
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When Q is sparse, Step 1 of G1DBN inference procedure gives already a 
good estimation of Q (See Precision- Recall curves obtained for simulated data 
in Figure 5). Even better results can be obtained with the 2 step-procedure 
which requires to tune two parameters a.\ and a^- Parameter a.\ is the selec- 
tion threshold of the edges of Q^> in Step 1 (that is the dimension reduction 
threshold), whereas parameter ct2 is the selection threshold for the edges of Q 
among the edges of DAG Q^ 1 ' . 

3.3 Choice of the thresholds 

The choice of thresholds is often something non trivial, especially when using 
multiple testing. However, Step 1 of the procedure is conservative by construc- 
tion. Indeed, the definition of score Si (equal to the maximum of p — 1 p- values 
computed for testing lst-order conditional independence) clearly supports the 
acceptation of the null assumption, i.e. the absence of an edge. Standard 
approaches for multiple testing correction do not apply to choose ot\ thresh- 
old. Thus we introduce a heuristic approach to choose ot\ threshold which 
is detailed in Supplementary Material [2], Section B. Overall, ct\ threshold is 
chosen so that, after the Step 1, the number of genes having exactly one parent 
in DAG predominates. 

The choice of ct2 threshold is less problematic. Indeed, the second Step of 
the inference procedure is a standard multivariate regression. Then the usual 
thresholds 1%, 5% or 10 % can be chosen or even a lower threshold when a low 
number of edges is wanted. However, a large number of tests are computed (as 
many as edges in DAG Q^). In such multiple testing situations, a set of the 
predictions are expected to be false and it is useful to control this. We control 
the expected proportion of false positives edges, i.e. the False Discovery Rate 
(FDR) with the approach introduced by Benjamini and Hochberg 3 [4]. 

3.4 Complexity of the algorithm 

The complexity of this algorithm is 0(p 3 ). However the scores (Si(i,j)) gP of 
the incoming edges of each target gene % can be computed separately by using 
parallel run. This option is available in the R package G1DBN by specifying 
the target gene i in the function DBNScoreStepl dedicated to the Step 1 
computation. 

3 Let m be the number of remaining edges after Step 1, then Step 2 requires to compute 
m tests. Choose a maximal FDR level q and order the set of m observed p-values: 
P(i) < ••" < P(i) < ••• < P(m)- Then reject the null assumption (Hg . "Edge i is not 
DAG Q") for alii < k where k is defined as follows: k = max {i : pu\ < j^q] . If no such 
i exists, reject no hypothesis. Benjamini and Hochberg (1995) showed that this procedure 
ensures the FDR is lower than q^- < q where mo is the number of true null hypotheses. 
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All the computations were performed on Redhat WS 4 AMD opteron 270 
(2GHz). The computation time mostly depends on the number of TF genes, 
i.e. the genes allowed to be parents in the DAG to be inferred. For an illustra- 
tion based on DBN inference performed from a real data set by Spellman [34] 
containing 786 target genes in Section 4.3, the computation of Step 1 required 
7 minutes when the set of possible TF genes was restricted to 18 genes (resp. 
4 minutes with the lasso [39] and 7 seconds with the shrinkage procedure [27], 
which are two alternative approaches for DBN inference introduced in Sec- 
tion 4.1). When all the 786 genes can be TFs, the computation was parallel 
run and required 19 minutes by target gene with G1DBN (resp. 8 minutes by 
target gene with the lasso and 5 minutes for the whole set of 786 target genes 
with the shrinkage procedure). Step 2 of G1DBN is very quick and requires 
less than 5 seconds for the 786-TF study. Despite the need for more time, in- 
ference with G1DBN for a data set containing 800 genes is fully computable, 
especially when parallel running. 

4 Validation 

4.1 Comparison with two reference methods 

We compare the G1DBN inference procedure with two reference methods for 
model selection for multivariate AR(1) process: the shrinkage approach by 
Opgen-Rhein and Strimmer [27] and the lasso (Least Absolute Shrinkage and 
Selection Operator) introduced by Tibshirani [39]. Opgen-Rhein and Strimmer 
recently proposed a model selection procedure based on an analytic approach 
using James- Stein- Type shrinkage. The procedure consists of first computing 
the partial correlation coefficients, r(Xl,X^_ l \X t ^ l ), from the shrinkage es- 
timates of the partial regression coefficients, and second, selecting the edges 
with a local false discovery rate approach [10]. Shrinkage inference is performed 
using the R code for shrinkage estimation 4 by Opgen-Rhein and Strimmer. 

The lasso (also called LI shrinkage) combines shrinkage and model selec- 
tion. The lasso estimates are obtained by minimizing the residual sum of 
squares subject to the sum of the absolute values of the coefficients being less 
than a constant. This approach offers the advantage that it automatically 
sets many regression coefficients to zero. We performed the lasso with the R 
package LARS developped by Efron et al. [11]. 

4 available at http : //strimmerlab . org/ software . html. 
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4.2 Simulation study 

As the discovery of genetic regulatory interaction is a field in progress, vali- 
dation of predictions made on real gene expression data is only partial, which 
may render the estimation of true and false positive detection rate not fully 
reliable [15]. Thus we first investigate the accuracy of G1DBN, the shrinkage 
and the lasso inference procedures on simulated data. 

Data generation We generated 100 random time series according to a mul- 
tivariate AR(1) model defined by parameters (Atp Xp \,B,T<) for p = 50 genes. 
Since gene regulation networks are sparse, each matrix A contains 5% of non 
zero coefficients. While keeping the number of parents low, this does not 
prevent a vertex from having more than one parent. Non zero regression coef- 
ficients Oy , mean coefficients hi and error variances <7j were drawn from uniform 
distributions {a ih k ~ZY([-0.95; -0.05]U[0.05; 0.95]), cr* ~W[0.03, 0.08]). Time 
series were generated under the corresponding multivariate AR(1) models for 
n = 20 to 50. 

Evaluation based on PR curves We evaluated the performance of DBN 
inference procedures using the Precision-Recall (PR) curve as plotted in Fig- 
ure 5. PR curves show the precision, equal to the Positive Predictive Value 
(PPV) on the ordinate against the recall, equal to the power, on the abscissa. 
PR curves are drawn by first ordering the edges by decreasing significance, 
and then by computing the PPV and power for the first selected edge and for 
each newly included edge successively. We recall the next definitions, 

Positive Predictive Value (PPV) = True Discovery Rate (TDR) 

= 1- False Discovery Rate (FDR) 
TP 

TP + FP 
TP 

Recall = Sensitivity = Power = — — — — 

J TP + FN 

where TP refers to the number of true positive edges, i.e. the number of edges 
which are selected by the inference procedure and actually belongs to the true 
DAG (used for simulating the data); FP refers to the number of false positive 
edges, i.e. the edges which are selected by the procedure but are not in the 
true DAG and FN refers to the number of false negative edges, i. e. the number 
of edges which are not selected by the procedure but are in the true DAG. 
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Figure 5: Precision-Recall (PR) cup^ r obtained for network inference from 
simulated data (n = 20). (A) Comparison of the inference procedures: G1DBN 
(LS or Tukey), shrinkage and lasso. Step 2 of the G1DBN approach drastically 
improves the results (threshold ot\ = 0.7). (B) Impact of noisy data, simulated 
using a non diagonal matrix E with either Gaussian or uniform noise, on the 
G1DBN procedure (Step 2) computed with LS estimates. 
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Simulation results We show on Figure 5 the results obtained with n = 20, 
a length one can expect from existing gene expression time series. Figure 5A 
displays the average Precision Recall (PR) curves obtained with the various 
inference approaches when the error covariance matrix £ is diagonal and the 
noise distribution is Gaussian. The Step 1 of the G1DBN procedure computed 
either with the LS estimator or with the Tukey estimator (dashed lines) gives 
a very high PPV for the very first selected edges. The Step 2 of the G1DBN 
procedure (solid line) drastically improves the results. It allows to maintain 
the PPV greater than 95 % while the power goes up to 50%. PR curves 
computed with the Huber estimates (not shown) led to comparable results. 
The lasso (dotted line) is clearly outperformed by the other approaches and 
the shrinkage approach (dashed-dotted line) gives results comparable to the 
Step 1 of the G1DBN procedure only. The results of the three methods are 
naturally improved for greater values of n but their relative perfomances are 
preserved (curves not shown). 

We investigated the impact of the violation of the model assumptions. First 
we performed DBN inference on simulated data where the error covariance ma- 
trix E is not diagonal (3% of the coefficients outside the diagonal differ from 0) 
and the noise distribution is either Gaussian or uniform (U[—2; 2]). As shown 
on Figure 5B, the accuracy of the G1DBN procedure (Step 2) is not strongly 
affected when these assumptions on the noise distribution are not satisfied. 
However, it is difficult to get rid of the I s * order Markov Assumption which 
was chosen in order to reduce the model dimension. When simulating an 
AR(2) model, the 2-order time dependencies existing in the model are missed. 
However, the 1-order time dependencies existing in the model are still recov- 
ered. Then, when considering a 2 nd order Markov process, an approximation 
can still be performed by successively inferring 1- and 2-order time dependen- 
cies. Note that the procedure also performs well when the number of parents 
in the true DAG Q is greater than one (See Supp. Material [2], Section A). 

4.3 Analysis of microarray time course data sets 

Spellman's Yeast cell cycle data set We performed dynamic network 
inference from the Saccharomyces cerevisiae cell cycle data collected by Spell- 
man et al. [34]. We used the a Factor-based synchronization data (18 time 
points) and we focus here on a set of 786 genes which demonstrated consis- 
tent periodic changes in transcription level (See Supplementary Material [2], 
Section D.l for more details). 
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Figure 6: Some results of the 18 TF-survey of S. cerevisiae cell cycle. 
(A) DAG containing the 18 first selected edges with G1DBN with LS esti- 
mates (PPV=60%). Colored nodes represent the TFs and the dark blue edges 
are validated by the Yeastract database. (B) Percentage of validated edges out 
of the first 5 to 1000 edges inferred with the G1DBN procedure, after Step 2 
or after Step 1 only, the shrinkage or the lasso procedure. The dashed line 
shows the proportion of validated edges out of the 786x18 possible edges. 
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We carried out two surveys on this dataset. First, we allow only a subset 
of 18 genes 5 identified as putative TFs to be possible parent genes (i.e. to 
have edges pointing out towards other genes in DAG Q) and look for their 
target genes. Then we extend the search for parent genes to the whole dataset 
of 786 genes in a second survey. We set a.\ threshold for the G1DBN proce- 
dure according to guidelines detailed in Supplementary Material [2], Section B 
(a i = 0.1 for the 18 TF-survey, a.\ = 0.05 for the 786 TF-survey). 

It is somehow difficult to assert the validity of the results obtained from real 
data as the whole regulatory machinery is not known yet. However the yeast 
cell cycle has been studied a lot and many regulation relationships have been 
recovered. We study the consistency of the first inferred edges with annotations 
in the Yeastract database [38], a curated repository currently listing found 
regulatory associations between TFs and target genes in S. cerevisiae. 

In the 18 TF-survey, the first few selected edges are biologically validated. 
In the DAG comprising the 18 first selected edges (Figure 6A), 11 edges refer 
to identified regulatory relationships (thick blue edges). The first detected TFs 
are the genes coding for proteins FKH2, NDD1, RAP1 and SWI4. In partic- 
ular, the proteins FKH2 (known as a TF with a major role in the expression 
of G2/M phase genes) and SWI4 (TF regulating late Gl-specific transcription 
of targets) are pointed out as being essential TFs; they have the most target 
genes and the high majority (73%) of these regulatory relationships is listed 
in Yeastract. 

As introduced in Section 3.3, we chose «2 threshold in order to keep the 
False Discovery Rate (FDR) smaller than 1% with the approach by Benjamini 
and Hochberg [4]. This lead to 0:2 = 0.0059. The corresponding inferred DAG 
is shown in Figure 7. The two proteins FKH2 and SWI4 are still part of 
the TFs having the most targets, together with NDD1, which is an essential 
component of the activation of the expression of a set of late-S-phase-specific 
genes and TEC1, a transcription factor required for full Tyl expression and 
Tyl-mediated gene activation (Ty transposable-element own for causing cell- 
type-dependent activation of adjacent-gene expression). The set of selected 
TFs is listed in Supplementary Material [2], Section D.2, Table 1, where the 
third column indicates the number of validated edges out of the selected ones. 
Except for NDD1, for which no target gene is listed in yeastract, one forth of 
the targets genes of the top four TFs are validated. 



5 The 18 genes code for proteins ACE2, FKH1, FKH2, GAT3, MBP1, MCM1, MIG2, 
NDD1, PHD1, RAP1, RME1. STB1, SUT1, SWI4, SWI5, SWI6, TEC1 and YOX1. consist 
of the overlap between the 786 genes under study and the 50 genes identified as putative 
TFs in a recent study by Tsai et al. [42]. 
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Figure 7: DAG inferred by G1DBN with LS estimates, using oi\ = 0.1, 
«2 = 0.0059 (ensuring FDR< 0.01), in the 18 TF-survey of the S. cerevisiae 
cell cycle. The 17 colored nodes represent the 16 TFs selected as parent node 
out of the 18 TFs under study, plus node FKH1 which is selected as a target of 
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For a comparative overview, the histogram of Figure 6B displays the per- 
centage of validated edges out of the first 5 to 1000 selected edges inferred 
with each inference procedure When considering the 1000 first inferred edges, 
the results are very similar to what could be expected by chance only. Note 
that, as the Step 2 of G1DBN choose 308 edges only, it is not considered when 
comparing the 1000 first edges. 

In the second survey including all the 786 genes as putative TFs, the di- 
mension is far higher and the results are consequently more restricted. Indeed, 
the proportion of validated edges doesn't exceed 12.5%, obtained with the 2nd 
step of G1DBN procedure among the first selected edges. However, this is still 
a subtantial result as compared with the proportion of validated edges (equal 
to 0.26%). In order to keep the FDR smaller than 0.01, we chose c*2 = 0.0067 
by following the Benjamini and Hochberg approach [4]. The inferred DAG 
for the 786 TF-survey contains 437 genes and 380 edges. The display of this 
DAG, as well as the list of its edges and the list of the genes selected as TFs, 
is available in Supplementary Material [2]. 

Diurnal cycle on the starch metabolism of A. Thaliana We applied 
the G1DBN inference procedure to the expression time series data generated 
by Smith et al. [33] to investigate the impact of the diurnal cycle on the 
starch metabolism of Arabidopsis Thaliana. We restricted our study to the 800 
genes selected by Opgen-Rhein and Strimmer [27] as having periodic expression 
profiles 6 . 

Using the heuristic approach detailed in Supplementary Material [2], Sec- 
tion B, we choose threshold a± = 0.02 allowing the distribution of the number 
of parents in the DAG Q^ 1 ' having the number of 0-parent genes to dominate 
and the number of 1-parent genes to be half as large. We set «2 = 0.005 in 
order to maintain the False Discovery Rate smaller than 0.01 by using the ap- 
proach by Benjamini and Hochberg [4] (See Section 3.3 for details). We recover 
the DAG in Figure 8 which has a "hub" connectivity structure. This network 
contains 206 edges implicating 277 different genes. We may notice that this 
DAG differs from the one inferred by Opgen-Rhein and Strimmer [27]. How- 
ever the edges selected by the three inference procedures discussed in this 
section differ somewhat (See the proportion of edges in common by using the 
various inference approaches in Supplementary Material [2], Section C) and 
may, in fact, yield complementary information or insights. 



6 The data are available in the GeneNet R package at http : / / strimmerlab . org/ 
sof tware/genenet/html/ar th800.html or in our R package G1DBN (arth8001ine). 
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Figure 8: DAG inferred with G1DBN from the data by Smith et al. [33] in 
order to investigate starch metabolism of A. thaliana (LS estimates, a\ = 0.1, 
«2 = 0.005 such that FDR< 0.01). The dark colored nodes are the 3 nodes 
with the most targets, 2 out of them are known for being implicated in starch 
metabolism. The light colored nodes are parent nodes already identified as 



Among the 'parent' nodes in the inferred DAG displayed in Figure 8, two 
nodes (799 and 628) out of the three having the most target refers to proteins 
that are known to be implicated in starch metabolism. Indeed, node 799, 
which has 14 'target 7 nodes, refers to DPE2 (DISPROPORTIONATING EN- 
ZYME 2), which is an essential component of the pathway from starch to su- 
crose and cellular metabolism in plant leaves at night. Node 628 (6 targets) is 
a transferase (At5g24300) implicated in the starch synthase. Node 702, which 
is an unknown protein (At5g58220), has also 6 targets. These three nodes are 
dark-colored in the DAG of Figure 8. Note that there is no prior knowledge 
regarding the role of each gene (TF or target) in this survey. As a consequence, 
some edges might be inferred wrong way around 7 . Thus node 799, which is a 
gene coding for an enzyme (DPE2), is most probably not a TF for its 14 ap- 
parent target genes. However node 799 is still the gene whose expression level 
best explains the expression of the 14 genes. Consequently these genes might 
be implicated in the same pathway as DPE2. The remaining parent nodes 
have from 1 to 4 targets. Among them, 9 genes, which are listed in Supple- 
mentary Material [2], Section E, Table 2, have already been identified as TFs 
or as DNA binding proteins. These 9 nodes are light-colored in the displayed 
DAG. Finally a list of 37 unknown proteins have been selected as parents in 
the inferred DAG. Potentially implicated in the regulation machinery of starch 
metabolism, these proteins represent a subset of genes which is relevant for fur- 
ther analyses. See more details on the inferred network displayed in Figure 8 
in the Supplementary Material [2]. 



5 Discussion and conclusion 

As more and more gene expression time series has become available, the need 
for efficient tools to analyze such data has become imperative. In this paper, we 
first determine sufficient conditions for Dynamic Bayesian Network modelling 
of gene expression time series. This type of modelling offers a straightforward 
interpretation: the edges of the DAG Q defining the DBN exactly describe 
the set of conditional dependencies between successive gene expression levels. 
Having defined and characterized low order conditional dependence DAGs for 
DBNs, we point out relevant characteristics for the approximation of sparse 
DAGs. In particular, under faithfulness assumption, DAG Q is included in the 
1 st order conditional dependence DAG 

7 In particular if some assumption of the model is not satisfied. For instance if an essential 
TF is missing or if the regulation is not transcriptional, i.e. does not depend on the amount 
of mRNA coding for the protein. 
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From these results, we develop G1DBN, a novel procedure for DBN infer- 
ence, which makes it possible to tackle the 'small n, large p' estimation case 
that occurs with genetic time series data. Based on the consideration of low 
order conditional dependencies, the G1DBN procedure proved to be powerful 
on both simulated and real data analysis. With respect to other methods, the 
shrinkage approach considerably improves the precision of the overall estima- 
tion of the partial correlation coefficients when the number of observations n is 
small compared to the number of genes p. However, considering I s * order con- 
ditional independence proved to be more efficient for DBN inference in terms 
of power and PPV on simulated data, and gave promising results on real data 
analysis. As for the lasso, one might notice that a drawback lies in the fact 
that the edge selection is done vertex by vertex whereas the DAG Q is globally 
sparse but not uniformally. As a consequence, the lasso tends to uniformally 
reduce the number of parents of each vertex instead of only keeping the total 
number of edges contained. 

The power of the G1DBN procedure comes from the accuracy improvement 
of the testing made possible by the dimension reduction. Indeed, as the first 
step selection is based on the I s * order conditional independence consideration, 
significance tests are performed in a model of dimension 4 (See Section 3.1). 
This represents a drastic dimension reduction compared to full order indepen- 
dence testing and makes the testing much more accurate. Thus, even if there 
are more edges in the DAG g {1) than in the true DAG Q (Proposition 6), 
Step 1 of the procedure is already very predictive. 

Throughout the analyses performed for this paper, we point out two major 
directions for further research. On the one hand, we noticed that the edges 
selected by the three inference procedures differ somewhat (See Supplementary 
Material [2], Section C). A further relevant study would consist of analyzing in 
which way these DBN inference procedures could have different strenghts and 
may be complementary. On the other hand, the use of robust estimators like 
Huber or Tukey bisquare did not allow a noticeable change of the inference 
approach on real data. Another interesting survey lies in the investigation of 
which measures of dependence, like non linear or other robust estimates, are 
the more pertinent to analyze gene expression data. 
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APPENDIX 



A Proofs 

A.l Lemmas 1 to 3 and proofs 

Lemma 1 Under Assumptions 1 and 2, the probability distribution P admits 
a DBN representation according to a DAG whose edges only join nodes rep- 
resenting variables observed at two successive time points, at least according 
to DAG Qfuu = (X, Xj)}j je p 4>1 ) which has edges between any pair of 

successive variables. 

Proof of Lemma 1. From assumption 1, the density / of the joint 
probability distribution of process X be written as the product of conditional 
densities, 

n 

f(X) = f(X 1 ) HfiXtlx^), (10) 

t=2 

where f{X t \X t -i) refers to the density of the conditional probability distribu- 
tion of X t given X t _i. 

From Assumption 2, for all t > 1, the conditional density f(X t \X t -i) can 
be written as the product of the conditional density of each variable X\ given 
the set of variables X t ~\ observed at the previous time, 

f(X t \X t ^) = l[f(Xl\X t ^). (11) 

From equations (10) and (11), the density / writes as the product of the 
conditional density of each variable X\ given its parents in Qf u ii- From Propo- 
sition 1, the probability distribution P admits a BN representation according 

tO Gfull- ■ 

Lemma 2 Assume the joint probability distribution P of process X has den- 
sity f with respect to Lebesgue measure on M. pxn . If P factorizes according 
to two different subgraphs of Qf u u, Q\ and Q2, then P factorizes according to 

From Lemma 2, it is straightforward that, among the DAGs 
included in Qf u u, there exists a minimal DAG (denoted by Q in the 
paper) according to which the probability distribution P factorizes, 
thus establishing a BN representation of process X. 
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Proof of Lemma 2. Consider a discrete-time stochastic process X = 
{X\; i G P, t G iV} whose joint probability P distribution has the density / 
with respect to Lebesgue measure on R pXTl . 

Let Qi and Q 2 be two different subgraphs of Qf u u according to which the 
joint probability distribution P factorizes. Let i G P, t G N, we consider the 
random variable X\. 
We denote as follows, 

• the following subsets of P, 

pai = {j eP;Xl 1 e P a(Xi,g 1 )} 

Wi = P\{pai} 

pa 2 = {j eP;X{_ 1 epa(XlQ 2 )} 
P«2 = P\{pa2} 

• and the densities of the joint or marginal probability distributions of 

(Xl,X t -i), 

g : W +1 — > R the density of the joint probability distribution of (XI, X t -i) 
g l the density of the probability distribution of X\, 



g the density of the joint probability distribution of (X t 



t-ij 



g l 'P ai the density of the joint probability distribution of (XI, Xf^\) where, 
g l 'P a 2 the density of the joint probability distribution of (X\, Xf_l) where 
etc... 

In the following, y G R, x = (x\,...,x p ) G W and we denote by x pai = 
{xj-J G pai} G M |pai1 (Thus x = (x pai ,x Wl ) = (x pa2 ,Xpa 2 ) G W). As the 
probability distribution P factorizes according to Q\, we derive from the DAG 
theory the conditional independence, 



t 

that is, 



xi±xn\xn, 



Wy G R, Va; G 



g(y,x) _ g i,pai (v,x pai ) 



g p (x) 9 pai (x pai ) 
Equivalent results derived from the factorization according to Q 2 gives, 



gl,P0.1 (y^ x 



\/y G R, x G W, N g^(y, x pm ) = * J?'" p ?' g paa ' 



g Pai ^ 



x, 



PCL2, 



pai, 
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By taking the integral with respect to x pa2np -a v we write for all y G R, for all 

paiUpa 2 



r (Z TO|paiUpa 2 | 



' P 2 (y-i%pa2)d'{Xp a2 r l pa i ) f , \ fl ,P 2 {%pa2)d (%pa2C\pa 1 

J 9 \%pai ) 

n^P a l( V r ) 

i,pa 1 r\pa 2 ( 1 . „ \ _ y KM' -^pai) n painpa 2 („ \ 

y \.y i ■ ij pa\C\pa,2 } pa\l \ y \ Jj paiC\pa,2l 

9 \%pa\ ) 

Finally we have, 

g(y,x) _ g l ' painpa2 (y,x painpa2 ) 



\/y e R, Vx e 



g p {x) gP a ^{x painpa2 ] 



that is the conditional density of the probability distribution of X\ given X t -\ 
is the conditional density of the probability distribution of X\ given xf®\ npa2 . 
Then P factorizes according to Q\ n Q%. ■ 

Lemma 3 ( Conditional independence between non adjacent suc- 
cessive variables ) Let Q be a subgraph of Qf u u according to which the prob- 
ability distribution P admits a BN representation. For any pair of successive 
variables (Xj_ l5 JQ) which are non adjacent in Q, we have 

X\ ± Xi_ x I pa(X l t , Q) and X\ ± X 3 t _ x \ pa(X l t , Q) U S, 

for all S subset of {X^ k G P,u <t}. 

As an illustration of Lemma 3, assume P admits a BN representation ac- 
cording to the DAG of Figure 1C. There is no edge between Xf and X^ +l in 
this DAG. Now consider in Figure 2B the moral graph of the smallest ances- 
tral graph containing Xf, X} +1 and the parents (X},Xf) of X} +1 . The set 
(X^X?) blocks all paths between Xf and X} +1 . From Proposition 2, we have 

xl +1 ±xf\pa(xl +11 g). 

Proof of Lemma 3. Assume P admits a BN representation according 
to Q, a subgraph of Qf u u. Let X\_ x and X\ be two non adjacent vertices 
of Q (there is no edge between them in Q) and consider the moral graph 
(^An(x i \jx j vjpa(x i g))) m °^ ^he sma Uest ancestral set containing the variables 
XI, X\_ x and the parents pa(X l t ,Q) of X\ in Q. As DAG Q is a subgraph of 
Gfidh ^e set of parents pa{X\, Q) blocks all paths between and X\ in the 
moral graph {Q An^vjx 3 upa(x i s)))" 1- From Proposition 2, this establishes the 

conditional independence X\ X X\_ x \ pa(Xl,Q). 

This result holds for the conditioning according to any subset S C {X^; k G 
P,u<t}. m 
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A. 2 Proof of Propositions 3, 5, 6 and 7 

Proof of Proposition 3. First, we show that P admits a BN representation 
according to Q. Let i,jEP such that X\ X X 3 _ x \X t ^ x , then we have, 

f(X}\X t _ 1 ) = f(Xl\x2 1 ). 

Under Assumptions 1 and 2, from Lemma 1 (See Appendix A.l) and Prop. 1, 
P admits a BN representation according to the DAG (X, E(Qf u u) \ (X 3 _ x , X\)) 
which has the edges of Qf u u except for the edge (X 3 _ x , X l t ). This holds for any 
pair of successive variables that are conditionally independent. 

From Lemma 2 (See Appendix A.l), P admits a BN representation ac- 
cording to the intersection of the DAG (X, E{Qf u u) \ (X 3 t _ x ,X l t )) for any pair 
{XlX 3 t _ x ) such that X\ X X 3 _ X \X^ X , that is DAG Q. 

Also, DAG Q cannot be reduced. Indeed, let (X l t _ x , X^) be an edge of Q 
and assume that P admits a BN representation according to G\(Xj._ x , X[ r), that 
is DAG Q with the edge {X l t _ 1 , X* ) removed. From Lemma 3 (Appendix A.l), 
we have X* X X l t _ x \X^ x , which contradicts (X l t _ x ,Xf) e V{Q) (i.e. X* X 
X\_ x \X t J: x ). 



Proof of Proposition 5. 

First, from Corollary 1, Q D 

Second, let X be a Gaussian process and (X 3 _ x , X % t ) G E(Q), then according 
to Proposition 3, X\ X X\_ x \ X^_ x . Since X is Gaussian, this implies 

Cov{xlxu\x*i x )^o. 

Now assume that there exists k ^ j, such that X\ X X 3 _ x \ X\_ x ie 
(X 3 _ x ,Xl) ^ E(Q^). We are going to prove that this contradicts the nullity 
of covariance Cov (XI, X 3 t _ x \X^ x ) ^ 0. 

Let / be an element of P\{j, k}. The conditional covariance Cov(ij\k, I) = 
Cov(Xl, X 3 _ x \Xt_ x , X\_ x ) can be written, 



-A' 



Cov(ij\k,l) = Cov(Xl,XL x \xu 



■ Cov(X},X 3 _ x \Xt x ) 



Cov(XlX\_ x 


Xl x )Cov(XU,X l t _ x 




Var(Xl x 





(Cov(X 3 _ x ,X l t _ x 




Var{Xl x 


X^_ x )Var 




xU 



Cov(Xi x ,X l t _ x 


Xl x )Cov 


(Xi,X l t _ x 


Xt_ x ,X 3 t _ x ) 


Var{X l t _ x 


xl x ) 
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However both terms in the latter expression of Cov(ij\k, I) are null: 

• since X\ ± Xf_ x \ X?_ 1: then Cov^X^ = 0, 

• as N^ ax (Q) < 1, X\_ x is the only parent of X\ in Q. So the vari- 
able X\_ x and thus also the set (X|_ x ,X t fe _i) blocks all paths between 
X\_ x and X\ in the moral graph of the smallest ancestral set contain- 
ing X\ U X J t *i l . Then we have, X\ ± X\_ x \ {Xf^X^}, that is 
Cov(Xi,XU \XiLi,xU) = 0- 

Then Cov(ij\k,l) = 0. By induction, we obtain Cov(Xl,Xj_ 1 \X t _t 1 ) = 
leading to a contradiction with (X^^Xf) E E{Q). Therefore (X^X}) E G {1) 
and we have Q C 
■ 

Proof of Prop 6 . 

Let {X 3 t _ x ,Xl) E E{G). Assume that (X^Xf) <£ E{Q^) then there exists 
a subset of q variables X^_ x with respect to which X\_ x and X\ are conditionally 
independent. From faithfulness, the subset Xf_ x separates X 3 t _ x and X\ in the 
moral graph of the smallest ancestral set containing X\ U X{_ x U Xf_ v This 
contradicts the presence of the edge (X/_ 1; X f J ) in Q. ■ 

Proof of Prop 7 . 

From faithfulness, Q C Q^ q \ Then for all i in P, for all t > 1, we have 

N pa (xig) < N pa (xi,gM) < q . 

From Proposition 4, (X^Xf) E(Q) (X^Xf) (£ E(Q®), that is 
(XU,Xt)E E{G®) =► (XUXi)E E{Q). 
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